Abstract

Surrogate Data Analysis (SDA) is a statistical hypothesis testing framework for 
the determination of weak chaos in time series dynamics. Existing SDA procedures 
do not account properly for the rich structures observed in stock return sequences, 
attributed to the presence of heteroscedasticity, seasonal effects and outliers. In 
this paper we suggest a modification of the SDA framework, based on the robust 
estimation of location and scale parameters of mean-stationary time series and a 
probabilistic framework which deals with outliers. A demonstration on the NAS- 
DAQ Composite index daily returns shows that the proposed approach produces 
surrogates that faithfully reproduce the structure of the original series while being 
manifestations of linear-random dynamics. 

1 Introduction



The search for nonlinear deterministic dynamics in stock market prices has
been an intensive area of research, and especially active in recent years
with the advances in Econophysics ([9]; [10]). The accurate determination of
stock return dynamics and their distributional properties is of main concern
here, as it can significantly improve portfolio formation and risk evaluation
practices, as well as allow the fine tuning of asset valuation procedures.



There have been several indications that stock prices do not fluctuate as
randomly as they should, according to the underlying theoretical equilibrium
framework (see discussions in Refs. [12]; [13]; [14]), and exhibit rich and
complex structures ([?]; [20]; [21]; [17]). However, earlier research
has not provided a clear answer regarding the presence or absence of nonlinear
determinism and chaos. Hence, the candidacy of deterministic chaos as
an alternative hypothesis to randomness has not enjoyed popularity among
economists. Limitations posed by the quantity and quality of
data, computational power and the absence of a widely accepted and
appropriate theoretical and statistical framework have also been factors that
contributed to the dispute against chaotic dynamics in finance and economics
([29]). However, a Monte-Carlo simulation-based statistical hypothesis testing
framework for detecting weak chaos appears to have been largely ignored in
finance until recently ([30]). This framework is called Surrogate Data Analysis
(SDA; [?]) and has historically preceded a significant amount of influential
research on chaotic dynamics in finance and economics.



SDA (see section 3 for details on how this methodology works) has been
primarily designed to ensure the validity of the results of investigations into
nonlinear determinism and the presence of weak chaos. Such investigations have
mainly focused on the examination of invariant measures, such as
dimension-based statistics for the characterization of strange attractors
([13]). However, such measures have been shown to provide misleading, biased or
inconclusive results, due to the presence of noise or the lack of sufficient
observations in the data sets examined ([?]; [27]; [28]). Though SDA can provide
the means to bypass some of the limitations posed by the quality and quantity
of the sequences under examination, the structure of their underlying
dynamics and their noise content can still pose serious difficulties. This
is particularly relevant to the analysis of financial time series, where
the nature of the data generating processes and the noise components are
still largely unknown, while the "mechanics" and the equilibrium conditions
of the market systems examined often appear empirically to be ill or loosely
defined. In particular, the presence of heteroscedastic noise in stock returns
and their nonstationary fluctuations, among other stylized facts ([38]), can
mask the presence of low dimensional nonlinear determinism. As mentioned
earlier, the greatest disadvantage of the nonlinear statistics based on
invariant measures is their lack of power, especially in financial
applications. SDA enables us to bypass this limitation. However,
heteroscedasticity may render this exercise useless, as the existing surrogate
methods are designed for homoscedastic time series. Their application to noisy
and heteroscedastic sequences may lead to misleading results and biased or
inaccurate conclusions ([41]). Since SDA is essentially a simulation of the
linear characteristics of a time series, it should be able to deal with
heteroscedasticity, outliers and calendar effects, which are major features of
financial time series. In this paper we demonstrate how to modify one of the
most advanced and popular surrogate methods, the Iterative Amplitude Adjusted
Fourier Transformed surrogates (IAAFT) ([?]), in order to account for important
stylized facts regarding heteroscedasticity, calendar effects and outliers in
stock return sequences.



2 Dealing with heteroscedasticity and outliers 

A time series is subject to heteroscedasticity when its variance is
time-varying. Empirical research on stock returns has shown that the variance
fluctuates over time and volatility tends to cluster, while outliers appear in
the time series, often attributed to exogenous factors and random events. The
use of robust statistics is therefore justified for the identification and
characterization of the underlying dynamics. Robust statistics were developed
principally during the 1970s, with a few related but major methodologies
appearing in the following decade. In this paper we make use of the Least Median
of Squares (LMS) concept introduced by [?]. The LMS estimator minimizes the
median of the squared discrepancies rather than the mean, as in
the Ordinary Least Squares (OLS) methodology. Hence, LMS estimators
produce results which are relatively immune to the presence of outliers
and the non-normality of the error distribution. One disadvantage of LMS
estimators is that they are considerably less efficient in the case of normally
distributed errors. However, it is well established that the distributions of
the first logarithmic differences of stock prices (i.e., logarithmic returns)
fail normality tests and exhibit strong leptokurtic features, which justifies
the applicability of the LMS concept.

Since outliers may pose considerations under the SDA framework, it is nec- 
essary to follow a policy for their classification and characterization. For the 
purposes of our approach here, in order to isolate the outliers of a given data 
set we suggest the following steps in the spirit of the wider LMS literature: 

(1) Find the LMS location parameter of the data set:

    \mathrm{loc} = \arg\min_{\theta} \, \operatorname{med}_i \, (x_i - \theta)^2,    (1)

i.e., determine the value of the parameter θ which minimizes the median
of the squared deviations of the observations x_i from θ. This can be easily
achieved by sorting the data set and calculating the midpoint of the range of
the 50% of the densest data.

(2) Find the LMS scale parameter of the data set:

    \mathrm{scale} = 1.4826 \times \left(1 + \frac{5}{n-1}\right) \times \sqrt{\operatorname{med}_i \, (r_i^2)},    (2)

where r is the residuals' vector obtained from the previous step, n is the
number of observations, and the consistency constant 1.4826 is the reciprocal
of the square root of the median of the chi-square distribution with one degree
of freedom ([43]). Hence, this scale parameter can be calculated once the LMS
location parameter θ is estimated.

(3) Calculate the zlms score: zlms = (x − loc)/scale, i.e., normalize the data
according to the LMS concept.



Rousseeuw and Leroy ([43]) propose the following fuzzy model (see also Fig. 1)
for determining the degree λ of a residual not being an outlier:



• If |zlms| ≤ 2.0 then λ = 1.0 and x is not an outlier,

• if 2.0 < |zlms| < 3.0 then λ = 3.0 − |zlms|, and x is not an outlier with
degree λ, and

• if 3.0 ≤ |zlms| then λ = 0.0, and x is an outlier.



[ Insert Fig. 1 about here. ] 



Our approach converts the above fuzzy model to a probabilistic one. In other
words, every time we run the surrogate data algorithm, a data point x is
classified as a non-outlier with probability equal to its degree λ. Thus, we
always classify as "outliers" values with a corresponding |zlms| score of 3.0
or more, and as non-outliers the values with a corresponding |zlms| score of
2.0 or less. A random number generator that produces uniformly distributed
random values in [0,1] decides on the intermediate |zlms| scores (i.e., scores
between 2 and 3). For example, if a data point x has a |zlms| score of 2.8
(λ = 0.2), a corresponding random number of 0.2 or greater will classify it as
an outlier, while a corresponding random number less than 0.2 will not classify
it as one.
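
The three steps and the probabilistic classification described in this section
can be sketched as follows. This is only an illustration under stated
assumptions: the function names are hypothetical, the location estimate is
taken as the midpoint of the shortest window holding n//2 + 1 sorted points
(one common reading of "the 50% of the densest data"), and the finite-sample
factor in the scale is assumed to be 1 + 5/(n − 1).

```python
import math
import random
from statistics import median


def lms_location(x):
    """Midpoint of the shortest window covering about half of the sorted data."""
    xs = sorted(x)
    n = len(xs)
    h = n // 2 + 1  # assumed size of a "half"
    # Slide a window of h consecutive order statistics and keep the narrowest.
    start = min(range(n - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    return 0.5 * (xs[start] + xs[start + h - 1])


def lms_scale(x, loc):
    """1.4826 * (1 + 5/(n-1)) * sqrt(med(r^2)); the correction factor is assumed."""
    n = len(x)
    med_r2 = median((v - loc) ** 2 for v in x)
    return 1.4826 * (1.0 + 5.0 / (n - 1)) * math.sqrt(med_r2)


def non_outlier_degree(z):
    """Degree (lambda) of not being an outlier, given a zlms score."""
    a = abs(z)
    if a <= 2.0:
        return 1.0
    if a >= 3.0:
        return 0.0
    return 3.0 - a


def classify(x, rng):
    """Return (loc, scale, degrees, outlier flags) for one run of the procedure."""
    loc = lms_location(x)
    scale = lms_scale(x, loc)
    lam = [non_outlier_degree((v - loc) / scale) for v in x]
    # A point is a non-outlier in this run when its degree exceeds a uniform draw.
    is_outlier = [not (l > rng.random()) for l in lam]
    return loc, scale, lam, is_outlier


loc, scale, lam, flags = classify([-1.0, -0.5, 0.0, 0.5, 1.0, 50.0], random.Random(0))
```

Points with a degree of exactly 1 or 0 are classified the same way in every
run; only the intermediate scores depend on the random draw.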

3 The methodology of the Probabilistic IAAFT surrogates



The SDA methodology focuses on producing simulated sets from a sequence
which capture only the linear properties of the original data. Then a
discriminating pivotal statistic is chosen. Sufficient evidence for rejecting
the null of linear stochastic dynamics is given when the value of the statistic
calculated on the original data differs significantly from its values obtained
from the surrogate sets. The simulation procedures for generating surrogate
data differ according to the null being considered. For example, a simple
reshuffling of the original sequence can test for white noise, whereas more
complicated reshuffling exercises may test for linearly filtered noise or
monotonic nonlinear transformations of linearly filtered noise. Usually the
last case is regarded as the most interesting, as the other procedures may
produce spurious results in the presence of linearly correlated noise that has
been transformed by a static, monotone nonlinearity. The SDA technique differs
from the Bootstrap ([45]), as it refers to a constrained randomization,
simulation-based hypothesis testing framework, as found in permutation tests
([46]).
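
As an illustration of the testing logic for the simplest null (white noise,
tested with shuffled surrogates), a minimal sketch is given below. The choice
of the lag-1 autocorrelation as discriminating statistic, the two-sided
rank-based p-value, and all function names are assumptions made for the
example only.

```python
import random


def lag1_autocorr(x):
    """Lag-1 autocorrelation, used here as an example discriminating statistic."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def shuffle_surrogate_test(x, statistic, n_surrogates=99, seed=0):
    """Compare the statistic of x with its values on shuffled surrogates."""
    rng = random.Random(seed)
    s_orig = statistic(x)
    s_surr = []
    for _ in range(n_surrogates):
        surrogate = list(x)
        rng.shuffle(surrogate)  # destroys all temporal structure
        s_surr.append(statistic(surrogate))
    centre = sum(s_surr) / n_surrogates
    # Count surrogates at least as far from the surrogate mean as the original.
    extreme = sum(abs(s - centre) >= abs(s_orig - centre) for s in s_surr)
    return s_orig, (extreme + 1) / (n_surrogates + 1)


ramp = [float(t) for t in range(100)]
stat, p_value = shuffle_surrogate_test(ramp, lag1_autocorr)
```

A small p-value indicates that the original sequence is unlikely under the
null that the surrogates represent. For the richer nulls discussed in this
section, only the surrogate generator would change.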



To test for the original sequence being a monotonic nonlinear transformation of
linearly filtered noise, one has to simulate surrogates according to the
following steps ([47]; [48]):



(1) Starting with the original sequence x, generate an independent and
identically distributed (i.i.d.) Gaussian data set y and reorder it according
to the ranking of x. In this way we can rescale the original sequence to a
normal distribution.

(2) Produce the Fourier transform of the rescaled sequence y and assign a
random phase to each (positive) frequency.

(3) Take the inverse transform of the above step's sequence, say y*. This stage
ensures that y* will exhibit the same power spectrum as the rescaled
sequence y.

(4) Reorder the original data x to generate a surrogate x_s which will have the
same rank distribution as y*. In this way the distribution of the original
sequence x is preserved exactly in x_s, while its spectrum is preserved only
approximately.
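
The four steps above can be sketched with NumPy as follows. The function name
is hypothetical, and leaving the zero-frequency and Nyquist terms untouched is
an implementation choice assumed here so that the inverse transform is real
and the mean is unchanged.

```python
import numpy as np


def aaft_surrogate(x, rng):
    """One AAFT surrogate of x, following steps (1)-(4)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x))  # rank of each observation, 0..n-1
    # Step 1: Gaussian sample reordered to follow the rank order of x.
    y = np.sort(rng.standard_normal(n))[ranks]
    # Step 2: keep the Fourier amplitudes of y, draw new random phases.
    spec = np.fft.rfft(y)
    phases = rng.uniform(0.0, 2.0 * np.pi, spec.size)
    randomized = np.abs(spec) * np.exp(1j * phases)
    randomized[0] = spec[0]  # zero frequency stays as it was
    if n % 2 == 0:
        randomized[-1] = spec[-1]  # Nyquist term stays as it was
    # Step 3: back to the time domain.
    y_star = np.fft.irfft(randomized, n)
    # Step 4: reorder x so that it shares the rank order of y_star.
    return np.sort(x)[np.argsort(np.argsort(y_star))]


rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(64))
x_s = aaft_surrogate(x, rng)
```

By construction x_s is a permutation of x, so the distribution of values is
reproduced exactly.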

The above surrogates are referred to as Amplitude Adjusted Fourier Transformed
surrogates, or AAFT for short. AAFT surrogates will have the same
distribution and amplitudes as the original sequence but will not exhibit
exactly the same power spectrum. To achieve the latter, an improved, iterative
version of AAFT surrogates (termed IAAFT) has been proposed. To produce IAAFT
surrogates ([?]) one has to follow the steps below:

(1) Apply a Fourier transform to the original sequence x and save the
amplitudes a. Produce a shuffled surrogate sequence x'_s from the original
x, apply a Fourier transform to x'_s and preserve the phases φ. Finally,
construct a vector r that contains the ranking of x'_s.

(2) Produce a phase randomized (AAFT) surrogate sequence x''_s, combining
a and φ. Compare the rank orders of x''_s and r. If these are the same,
proceed to the next step; otherwise the vector r hosts the rankings of x''_s,
φ hosts the phases of x''_s, and the procedure of this step is repeated. This
step can also be terminated if the maximum number of iterations defined
by the user (e.g., 1000) is reached. Thus we avoid strong discrepancies
between the surrogates and the original sequence's spectrum.

(3) Force x''_s to follow the distribution of x, by assigning to its indices
the corresponding values of x.
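
A minimal sketch of the iterative scheme is given below. It follows the widely
used formulation of the iteration (impose the target amplitudes, then restore
the target distribution by rank ordering, until the rank order stops
changing), which may differ in detail from the ordering of the steps above.
The function name and the default iteration cap are illustrative.

```python
import numpy as np


def iaaft_surrogate(x, rng, max_iter=1000):
    """One IAAFT surrogate of x: same values, spectrum close to that of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amplitudes = np.abs(np.fft.rfft(x))  # target Fourier amplitudes a
    sorted_x = np.sort(x)  # target distribution of values
    s = rng.permutation(x)  # shuffled starting surrogate
    prev_ranks = None
    for _ in range(max_iter):
        # Keep the current phases, impose the amplitudes of the original.
        phases = np.angle(np.fft.rfft(s))
        candidate = np.fft.irfft(amplitudes * np.exp(1j * phases), n)
        # Impose the distribution of x through the rank order of the candidate.
        ranks = np.argsort(np.argsort(candidate))
        s = sorted_x[ranks]
        if prev_ranks is not None and np.array_equal(ranks, prev_ranks):
            break  # rank order unchanged: the iteration has converged
        prev_ranks = ranks
    return s


rng = np.random.default_rng(1)
x = rng.standard_normal(128)
x_s = iaaft_surrogate(x, rng, max_iter=50)
```

Ending each pass with the rank-ordering step means the returned surrogate
always reproduces the values of x exactly, with the spectrum matched as
closely as the iteration allows.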

The IAAFT surrogates ensure that the main linear features of a time series
will be faithfully preserved. However, the above procedure has been designed
for stationary time series and therefore cannot cope with the presence of
heteroscedasticity and outliers. In other words, and with respect to the
classification produced in section 2, the IAAFT surrogates have been designed
for time series where all the observations are subject to |zlms| ≤ 2. According
to the framework proposed in this paper, and in order to take into account the
outliers that are observed in stock returns, we have to modify the surrogate
generating algorithm according to the following steps:

(1) Calculate the LMS location parameter of the time series.

(2) Calculate the LMS scale parameter of the time series.

(3) Calculate the zlms for each observation.

(4) Convert the zlms to λ, according to section 2.

(5) Create a new series of uniformly distributed random numbers in [0,1], say
u, with length equal to the length of the original time series.

(6) Create a new time series x_s, which contains all the values of x that
correspond to λ_i > u_i.

(7) Apply the IAAFT surrogate algorithm to x_s.

(8) The final surrogate sequence will preserve the values of x that correspond
to λ_i ≤ u_i, in exactly the same positions as in the original sequence, and
will receive the surrogate of x_s for λ_i > u_i, to fill the remaining gaps.
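
Steps (5) to (8) can be sketched as below, taking the degrees of not being an
outlier from step (4) as given. To keep the example self-contained, the
surrogate generator of step (7) is passed in as a function; the names `piaaft`
and `surrogate_fn` are hypothetical, and the reversal used in the usage lines
is only a stand-in for an IAAFT routine.

```python
import random


def piaaft(x, lam, surrogate_fn, rng):
    """Combine preserved outliers with a surrogate of the remaining points."""
    # Step 5: one uniform random number per observation.
    u = [rng.random() for _ in x]
    # Step 6: indices treated as non-outliers in this run (degree > u).
    free = [i for i in range(len(x)) if lam[i] > u[i]]
    x_s = [x[i] for i in free]
    # Step 7: surrogate of the non-outlying sub-series.
    surrogate = surrogate_fn(x_s)
    # Step 8: outliers keep their original positions, the gaps are filled in order.
    out = list(x)
    for i, value in zip(free, surrogate):
        out[i] = value
    return out


x = [1.0, 2.0, 3.0, 100.0, 4.0, 5.0]
lam = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0]
result = piaaft(x, lam, lambda s: list(reversed(s)), random.Random(0))
```

In this toy case the value 100.0 has degree 0 and therefore stays in place in
every run, while the other five values are rearranged by the stand-in
generator.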

Our experiments below show that with the above procedure (termed
Probabilistic IAAFT, or PIAAFT for short), the outliers, volatility clustering
and hence heteroscedasticity can be faithfully reproduced with a "reasonable"
probability, according to their level of presence in the original sequence.
Moreover, the rest of the desirable properties of the IAAFT surrogates are
preserved.



4 Calendar Correction 

So far we have described a surrogate generation procedure which is able
to account for heteroscedasticity. In this section we also demonstrate how to
account for calendar effects. As a first step we have to define what we mean
here by the term "calendar effects". Since there is no universal definition, we
assume eight kinds of calendar effects. The first five effects, and the least
important ones, are the five weekdays. Next, and of greater importance, the
first and last trading days of a month (day-of-month) are considered as
calendar effects. Finally, we have the holiday effect, which is also assumed
here to be the most important. For example, if a trading day can be
characterized as both a pre-holiday and an end-of-month day, the holiday effect
applies. Following the same rationale, if a trading day is both a Thursday and
the first day of a trading month, it is classified according to the latter
effect.

In order to specialize the algorithm given in section 3, we have to reconsider
its first 3 steps for the "calendar-wise" time series. To achieve this, we
normalize (using the LMS parameters) every calendar-wise distribution
separately. The rest of the steps are followed without any change, save for the
7th step, which has to be adapted according to the calendar structure of the
time series. This procedure is the Calendar Corrected version of the PIAAFT
(hence CCPIAAFT).
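
The priority rule among the eight calendar classes can be illustrated with the
sketch below. Two points are assumptions made for the example and are not
specified in the text: a trading day is treated as a holiday-effect day when at
least one weekday is missing between it and the next trading day (a
"pre-holiday"), and the last trading day of a month takes precedence over the
first. The function and label names are hypothetical.

```python
from datetime import date, timedelta

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")  # trading days assumed Mon-Fri


def calendar_classes(trading_days):
    """Label each trading day; holiday > day-of-month > weekday."""
    labels = []
    last = len(trading_days) - 1
    for i, d in enumerate(trading_days):
        prev_day = trading_days[i - 1] if i > 0 else None
        next_day = trading_days[i + 1] if i < last else None
        label = WEEKDAYS[d.weekday()]
        if prev_day is not None and prev_day.month != d.month:
            label = "first-of-month"
        if next_day is not None and next_day.month != d.month:
            label = "last-of-month"
        # Assumed pre-holiday rule: a weekday without trading lies ahead.
        if next_day is not None and any(
            (d + timedelta(days=k)).weekday() < 5
            for k in range(1, (next_day - d).days)
        ):
            label = "holiday"
        labels.append(label)
    return labels


days = [date(2003, 6, 27), date(2003, 6, 30), date(2003, 7, 1),
        date(2003, 7, 2), date(2003, 7, 3), date(2003, 7, 7)]
labels = calendar_classes(days)
```

With such labels, the returns can be grouped by class and each group
normalized with its own LMS location and scale before the degrees of not being
an outlier are computed.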

5 Empirical Results

This section compares the surrogates produced by the proposed CCPIAAFT
algorithm to the surrogates of the IAAFT algorithm. Our time series is the
NASDAQ Composite Index, daily closings, from 5-Feb-1971 to 31-Dec-2003.
There are 8311 observations in total. Since all the surrogate generating
algorithms need the original time series to be at least mean stationary, we
work with the first logarithmic differences of the daily closing prices (i.e.,
the continuously compounded returns).

[ Insert Fig. 2 about here. ] 
[ Insert Fig. 3 about here. ] 

As Figs. 2 and 3 show, there is no need for specific statistical tests to
realize the difference between the compared surrogate algorithms. The CCPIAAFT
surrogates "imitate" extremely well the heteroscedasticity caused by volatility
clustering in the original time series and the trend changes that are implied.
In Figs. 4 and 5 we utilize the correlation integral (CI; [?]) to demonstrate
that the CCPIAAFT surrogates result in a CI much closer to that of
the original time series.
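
For reference, a correlation integral of the usual form (the fraction of pairs
of delay-embedded vectors lying closer than a radius r) can be sketched as
follows. The maximum norm, the unit delay and the function name are
assumptions for this example; the exact settings behind Figs. 4 and 5 are not
restated here.

```python
import numpy as np


def correlation_integral(x, m, r, delay=1):
    """Fraction of pairs of m-dimensional delay vectors closer than r."""
    x = np.asarray(x, dtype=float)
    n = x.size - (m - 1) * delay  # number of embedded vectors
    # Row t of emb is (x[t], x[t + delay], ..., x[t + (m - 1) * delay]).
    emb = np.column_stack([x[i * delay: i * delay + n] for i in range(m)])
    # Maximum-norm distance between every pair of embedded vectors.
    dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
    upper = np.triu_indices(n, k=1)  # each pair counted once
    return float(np.mean(dist[upper] < r))


c_flat = correlation_integral([0.0, 0.0, 0.0, 0.0], m=2, r=0.1)
c_ramp = correlation_integral([0.0, 10.0, 20.0, 30.0], m=2, r=1.0)
```

This version stores all pairwise distances at once, so it is suited to short
series only; a series of several thousand observations would need a blockwise
or tree-based computation.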

[ Insert Fig. 4 about here. ] 
[ Insert Fig. 5 about here. ] 

Considering the IAAFT surrogates as our null hypothesis implies that we
theorize that extreme events (such as the oil crisis of 1973, the Black Monday
of 1987 and the recent bubble of 2000) can occur with equal probability, a
premise that voluminous research in finance has challenged so far. Certain
events that trigger unanticipated changes occur due to exogenous political and
economic (and not necessarily market) dynamics. Therefore, if these
unsystematic fluctuations could be preserved, along with any other calendar
effects, one could produce financial surrogates that faithfully reproduce
certain market realities. The linear correlations and the randomization of the
returns should only affect the systematic components. Hence, CCPIAAFT
surrogates essentially isolate the systematic from the unsystematic changes.
The degree to which this is achieved is highlighted in Figs. 2 and 3. Figs. 6
and 7 also show various realizations of CCPIAAFT surrogates for comparison
purposes.

[ Insert Fig. 6 about here. ] 
[ Insert Fig. 7 about here. ] 

6 Conclusions 

In this paper we suggest a method which incorporates outliers and calendar
effects in the production of surrogate data. In financial time series, where
heteroscedasticity, in the sense of volatility clustering, is the most striking
feature, the proposed method yields simulated sequences which are more similar
to the original time series than those of other surrogate data generating
methods. Moreover, the proposed approach has the advantage of behaving
as the IAAFT algorithm when no heteroscedasticity or calendar effects are
present. We do not assume (G)ARCH volatility structures; however, our strategy
can be modified to accommodate such a case. We reserve this as an area
for future research.

Fig. 1. The model proposed by Rousseeuw and Leroy (1987) ([43]) regarding the distinction between outliers and the bulk of the
observations, according to the |zlms| score. In this model λ on the vertical scale represents the degree of a point not being an outlier.
Observations with |zlms| ≤ 2 are not considered outliers, and observations with |zlms| ≥ 3 are surely considered outliers. In between
these two extremes, the degree falls linearly.

Fig. 2. The original time series (bottom) and 5 surrogates (from top to bottom): the shuffled surrogates (top), the phase randomized
surrogates, the AAFT surrogates, the IAAFT surrogates and the CCPIAAFT surrogates. It is evident that the CCPIAAFT series preserve
the salient features of the original sequence, especially the volatility clustering and the outliers (shocks) which are linked to well known
historical events such as the crash of 1987 and the uncertainty after the burst of the more recent financial bubble.




Fig. 3. The levels of the time series shown in Fig. (2). The CCPIAAFT surrogate series levels (2nd from bottom) preserve exactly
the trends that the original time series exhibit, while all the other sequences above follow a general trend with no time-specific
characteristics.

Fig. 4. The correlation integral on the series of Fig. (2) with embedding dimensions
(a) m = 2 and (b) m = 3.

Fig. 5. The logarithm of the norm-2 difference between the correlation integral of the original time series and the surrogates, shown in
Fig. (2). We observe that in both cases the CCPIAAFT surrogates show the smallest difference compared to their counterparts, implying 
that the CCPIAAFT surrogates provide improved simulations of the original time series. 




Fig. 6. A comparison of the original time series and 4 CCPIAAFT surrogate series. Which one is the original? (Answer: the 4th from 
above). 

Fig. 7. The levels of the series shown in Fig. (6). In this graph the differentiation from the original time series is obvious in very few
specific time domains. More precisely, we can observe that the drop of the index related to the 1974 crisis and the increase related to the 
2000 bubble, appear to be smoother in all surrogate series. This is attributed to the small daily changes in each case being considered as 
part of the normal fluctuations of the original time series by the CCPIAAFT procedure. 



